Abstract

In this didactical note I review in depth the rationale for using 
generalised canonical distributions in quantum statistics. Particular 
attention is paid to the proper definitions of quantum entropy and 
quantum relative entropy, as well as to quantum state reconstruc- 
tion on the basis of incomplete data. There are two appendices in 
which I outline how generalised canonical distributions link to the 
conventional formulation of statistical mechanics, and how classical 
probability calculus emerges at the macroscopic level. 



*email: jochen.rau@q-info.org



1 What is the problem? 



Reasoning with probabilities requires two basic algorithms: (i) a rule for
updating probabilities when new evidence becomes available; and (ii) a
prescription for determining the starting point, i.e., for constructing the initial
probability distribution on the basis of (usually incomplete) prior
knowledge. In classical statistical inference^1 these two ingredients are furnished by
Bayes rule and the maximum entropy principle, respectively. Bayes rule



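$$\mathrm{prob}(\text{hypothesis}\,|\,\text{data}) = \frac{\mathrm{prob}(\text{data}\,|\,\text{hypothesis}) \cdot \mathrm{prob}(\text{hypothesis})}{\mathrm{prob}(\text{data})} \qquad (1)$$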



stipulates how probabilities are to be updated in the light of new data, thus 
encapsulating the process of learning; whereas the maximum entropy prin- 
ciple provides the starting point for this learning process by assigning to a 
hypothesis i the prior probability 



$$q_i = \frac{1}{Z} \exp\left[-\sum_{a=1}^{m} \lambda^a G_a^i\right], \qquad i = 1 \ldots d, \qquad (2)$$



with partition function 



$$Z := \sum_{i=1}^{d} \exp\left[-\sum_{a=1}^{m} \lambda^a G_a^i\right]. \qquad (3)$$



Such a generalised canonical distribution maximises the classical entropy 



$$S[\{q_i\}] := -\sum_{i=1}^{d} q_i \ln q_i \qquad (4)$$



under the normalisation condition $\sum_i q_i = 1$ and the $m$ ($m < d$) linearly
independent constraints

$$\sum_{i=1}^{d} q_i G_a^i = g_a, \qquad a = 1 \ldots m, \qquad (5)$$



which are deemed the only prior information available. These constraints
uniquely specify the $m$ Lagrange parameters $\{\lambda^a\}$.
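The way the constraints fix the Lagrange parameter can be illustrated with a minimal sketch for a single constraint. The die with faces 1 to 6, the target mean 4.5 and all function names below are assumptions chosen purely for illustration; they do not appear in the text above. The sketch solves the constraint for the Lagrange parameter by bisection and evaluates the resulting canonical distribution and its entropy.

```python
import math

def canonical_distribution(values, lam):
    """Return q_i = exp(-lam * G_i) / Z for a single constraint observable."""
    weights = [math.exp(-lam * g) for g in values]
    z = sum(weights)  # partition function Z
    return [w / z for w in weights]

def expectation(values, q):
    """Expectation value sum_i q_i G_i."""
    return sum(g * q_i for g, q_i in zip(values, q))

def entropy(q):
    """Classical entropy -sum_i q_i ln q_i (terms with q_i = 0 contribute nothing)."""
    return -sum(q_i * math.log(q_i) for q_i in q if q_i > 0.0)

def solve_lagrange_parameter(values, target, lo=-50.0, hi=50.0):
    """Bisection: the expectation value decreases monotonically with lam."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expectation(values, canonical_distribution(values, mid)) > target:
            lo = mid  # mean still too large, so lam must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)

faces = [1, 2, 3, 4, 5, 6]  # hypothetical observable: face value of a die
lam = solve_lagrange_parameter(faces, target=4.5)
q = canonical_distribution(faces, lam)
print(lam, q, entropy(q))
```

Bisection is used here only because the expectation value is monotonic in a single Lagrange parameter; with several constraints one would need a multidimensional root finder instead.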



^1 For excellent introductions to classical statistical inference see, e.g., the very readable
book by Sivia [Siv96] and the seminal work of Jaynes [Jay89, Jay03].

Quantum theory is a full-fledged probabilistic theory that despite its counterintuitive
features shares with classical probability calculus a high degree
of internal consistency [CFS02a, Rau07]. It should therefore be possible to 
erect the edifice of quantum statistical inference on the same two pillars. 
Indeed: 



1. There is a quantum analog of the classical Bayes rule [SBC01]. This
"quantum Bayes rule" pertains to experiments on exchangeable se- 
quences. An exchangeable sequence of length N can be thought of 
informally as a finite subsequence of an infinite sequence of systems 
whose order is irrelevant. It has a probability distribution of the de 
Finetti form [CFS02b] 

$$\rho^{(N)} = \int_{S(d)} d\rho\; \mathrm{prob}(\rho)\, \rho^{\otimes N}, \qquad (6)$$

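where the "meta-probability" $\mathrm{prob}(\rho) \geq 0$ is normalised to

$$\int_{S(d)} d\rho\; \mathrm{prob}(\rho) = 1 \qquad (7)$$

and the integration is over the manifold $S(d)$ of probability distributions.
After ascertaining the outcome $\Gamma^{(K)}$ of $K$ ($K < N$) trials (i.e.,
of some measurement performed on $K$ constituents of the sequence)
the posterior $\rho^{(N-K)}$ for the remaining $(N - K)$ constituents^2 still has
the de Finetti form, yet with a new meta-probability that has been
updated according to the quantum Bayes rule

$$\mathrm{prob}(\rho|\Gamma^{(K)}) = \frac{\mathrm{prob}(\Gamma^{(K)}|\rho)\, \mathrm{prob}(\rho)}{\mathrm{prob}(\Gamma^{(K)})} \qquad (8)$$

where $\mathrm{prob}(\Gamma^{(K)}|\rho) := \mathrm{prob}(\Gamma^{(K)}|\rho^{\otimes K})$ and

$$\mathrm{prob}(\Gamma^{(K)}) := \int_{S(d)} d\rho'\; \mathrm{prob}(\Gamma^{(K)}|\rho') \cdot \mathrm{prob}(\rho'). \qquad (9)$$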



^2 Not for all $N$ constituents, because $K$ constituents have been disturbed by quantum
measurement.

2. There is a quantum version of the maximum entropy prior, namely the
generalised canonical statistical operator [Bal91]

$$\rho = \frac{1}{Z} \exp\left[-\sum_{a=1}^{m} \lambda^a G_a\right] \qquad (10)$$

with

$$Z := \mathrm{tr}\left\{\exp\left[-\sum_{a=1}^{m} \lambda^a G_a\right]\right\}, \qquad (11)$$

which maximises the von Neumann entropy

$$S[\rho] := -\mathrm{tr}(\rho \ln \rho) \qquad (12)$$

under the constraints $\mathrm{tr}\,\rho = 1$ and

$$\langle G_a \rangle_\rho := \mathrm{tr}(\rho\, G_a), \qquad a = 1 \ldots m. \qquad (13)$$



It is the foundations of the second pillar that shall be inspected more closely 
in this didactical note. 
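For a single qubit the generalised canonical operator (10) can be written in closed form, which gives a small sketch of how the constraints fix the Lagrange parameters even when the observables do not commute. The choice of the Pauli matrices as constraint observables, the Bloch-vector parametrisation $\rho = (1 + \vec{r}\cdot\vec{\sigma})/2$ and all function names are assumptions made for illustration only. With these assumptions, $\exp(-\vec{\lambda}\cdot\vec{\sigma})/Z$ has Bloch vector $-\tanh(|\vec{\lambda}|)\,\vec{\lambda}/|\vec{\lambda}|$ and $Z = 2\cosh|\vec{\lambda}|$.

```python
import math

def canonical_qubit(lam):
    """Bloch vector of rho = exp(-lam . sigma) / Z, where Z = 2 cosh|lam|."""
    norm = math.sqrt(sum(x * x for x in lam))
    if norm == 0.0:
        return (0.0, 0.0, 0.0)  # no constraints: uniform state 1/2
    scale = -math.tanh(norm) / norm
    return tuple(scale * x for x in lam)

def lagrange_parameters(g):
    """Invert the map above: find lam such that <sigma_a> = g_a (needs |g| < 1)."""
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return (0.0, 0.0, 0.0)
    scale = -math.atanh(norm) / norm
    return tuple(scale * x for x in g)

def von_neumann_entropy(r):
    """S[rho] = -tr(rho ln rho) for rho = (1 + r . sigma) / 2."""
    norm = math.sqrt(sum(x * x for x in r))
    eigenvalues = ((1.0 + norm) / 2.0, (1.0 - norm) / 2.0)
    return -sum(p * math.log(p) for p in eigenvalues if p > 0.0)

# Hypothetical data: <sigma_x> = 0.3 and <sigma_z> = 0.4 are constrained, <sigma_y> is not.
g = (0.3, 0.0, 0.4)
lam = lagrange_parameters(g)
r = canonical_qubit(lam)
print(lam, r, von_neumann_entropy(r))
```

Any other state with the same values of $\langle\sigma_x\rangle$ and $\langle\sigma_z\rangle$ has a non-zero $y$-component of the Bloch vector, hence a longer Bloch vector and a smaller von Neumann entropy.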

Strictly speaking, in both the classical and the quantum case the use
of the generalised canonical form (2) or (10) rests on two tacit assumptions
which are often, but not always justified: (i) that the probability distribution
is indeed normalised, $\sum_i q_i = 1$ or $\mathrm{tr}\,\rho = 1$; and (ii) that "total ignorance"
(i.e., the absence of any constraints, $\lambda^a = 0$) must correspond to a uniform
distribution $q_i = 1/d$ or $\rho = 1/d$, with $d := \mathrm{tr}\,I$ in the latter case. When
these two assumptions are relaxed then the prior must have the more general
form (in the quantum case)



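$$\rho = \frac{l}{Z} \exp\left[(\ln\sigma - \langle\ln\sigma\rangle_{1/d}) - \sum_{a=1}^{m} \lambda^a G_a\right] \qquad (14)$$

with partition function

$$Z := \mathrm{tr}\left\{\exp\left[(\ln\sigma - \langle\ln\sigma\rangle_{1/d}) - \sum_{a=1}^{m} \lambda^a G_a\right]\right\}, \qquad (15)$$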



where the state $\sigma$ represents "total ignorance", i.e., the (possibly non-uniform)
starting distribution in the absence of any constraints ($\lambda^a = 0$). This more
general prior, rather than maximising the von Neumann entropy (12),
minimises the quantum relative entropy^3

$$S(\rho\|\sigma) := \mathrm{tr}(\rho\ln\rho - \rho\ln\sigma) \qquad (16)$$

with respect to $\sigma$ under the constraints (13) and $\mathrm{tr}\,\rho = l \in (0,1]$ [Rus88];
it reduces to the familiar canonical form (10) whenever $\sigma = 1/d$ and $l = 1$.
The generalisation of the classical case is completely analogous and involves
minimising the classical relative entropy

$$S(\{q_i\}\|\{p_i\}) := \sum_{i=1}^{d} q_i \ln\frac{q_i}{p_i}. \qquad (17)$$

The need to consider such more general situations is particularly apparent
in the case of classical continuous distributions,

$$q_i \to \pi(x), \qquad \sum_i \to \int dx, \qquad (18)$$

where, depending on the coordinates chosen, "total ignorance" need no
longer correspond to a uniform distribution. The extremisation of relative
rather than ordinary entropy, and hence the use of minimum relative entropy
(MinREnt) rather than maximum entropy (MaxEnt) priors then ensures that
any conclusions drawn from statistical inference are coordinate-independent
[RM96].
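The classical version of the MinREnt construction can be checked numerically. The sketch below assumes the standard form $q_i \propto p_i \exp(-\lambda G_i)$ for a single constraint, which is the classical counterpart of (14) with $l = 1$; the non-uniform prior, the observable values and the function names are hypothetical. It then perturbs the distribution in a direction that preserves both normalisation and the constrained mean, so that the relative entropy with respect to the prior can be seen to increase.

```python
import math

def relative_entropy(q, p):
    """Classical relative entropy S(q||p) = sum_i q_i ln(q_i / p_i), Eq. (17)."""
    return sum(q_i * math.log(q_i / p_i) for q_i, p_i in zip(q, p) if q_i > 0.0)

def minrent_distribution(prior, values, lam):
    """q_i proportional to p_i * exp(-lam * G_i): the prior reweighted by the constraint."""
    weights = [p_i * math.exp(-lam * g) for p_i, g in zip(prior, values)]
    z = sum(weights)
    return [w / z for w in weights]

prior = [0.4, 0.2, 0.1, 0.1, 0.1, 0.1]  # hypothetical non-uniform "ignorance" distribution
values = [1, 2, 3, 4, 5, 6]             # hypothetical constraint observable
q = minrent_distribution(prior, values, lam=-0.3)

# The direction (1, -2, 1, 0, 0, 0) leaves both sum_i q_i and sum_i q_i G_i unchanged.
eps = 1e-3
perturbed = [q[0] + eps, q[1] - 2 * eps, q[2] + eps] + q[3:]

print(relative_entropy(q, prior), relative_entropy(perturbed, prior))
```

For a uniform prior the same function reproduces the classical analog of the relation between entropy and relative entropy, $S(q\|1/d) = \ln d - S[q]$, so MinREnt then coincides with MaxEnt.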

In most textbooks on statistical mechanics the derivation of the quantum 
state (10) — or of the more general state (14) — proceeds heuristically by sim- 
ple analogy with the classical case. Yet at closer inspection the justification 
for using these states is far from obvious. Specifically, the quantum case 
presents two major, interrelated difficulties: 

• The von Neumann entropy (12) is not the only possible "quantisation"
of the classical entropy (4) and hence not necessarily the quantity to be
maximised; there are various other conceivable definitions of quantum
entropy that reduce to Eq. (4) in the classical limit. Likewise, in the
more general setting involving non-uniform ignorance distributions $\sigma$
there are various conceivable definitions of quantum relative entropy
that all reduce to Eq. (17) in the classical limit, and that hence are
candidates for minimisation. Singling out the definitions (12) and (16)
as the correct quantum analogs of Eqs. (4) and (17) is a non-trivial
task [Och75, Don86, HP91].

^3 The following notation is not universal but the most commonly used in the modern
literature on quantum information theory [NC00]. Some authors, e.g. [Per95], also use
$S(\sigma|\rho) \equiv S(\rho\|\sigma)$.

• The use of the quantum state (10) or (14), respectively, implies the
assertion that such a generalised canonical distribution is indeed most
typical of the states allowed by the constraints (13); i.e., that whenever
the expectation values (13) have been ascertained as measured
averages in sufficiently many trials on an exchangeable sequence, the
posterior single-constituent state must be close to this generalised canonical
state. While in the classical case this can be understood with the help of
Jaynes' "entropy concentration theorem" [Jay79, Rau98] or the
didactical "monkey" and "kangaroo" arguments [Siv96],^4 the extension to the
quantum case, especially if the constraints pertain to non-commuting
observables, is far from straightforward [BB87, BDD+99].

In the following two sections I will address these issues — the proper quantisa- 
tion of entropies and the alleged typicality of canonical quantum states — one 
by one. There will also be two short appendices linking these fundamental 
concepts to the conventional formulation of statistical mechanics and sketch- 
ing the emergence of classicality in the macroscopic limit, respectively. 

2 Quantum entropies 

The concepts of entropy and relative entropy play a pivotal role in
mathematical physics, both in statistical mechanics [Bal91] and in modern quantum
information theory [NC00, SW00, VPJK97, Ved02]. Since the ordinary
entropy $S[\rho]$ can be expressed in terms of the relative entropy,

$$S[\rho] = S[1/d] - S(\rho\|1/d) \qquad (19)$$

(with calibration such that $S[|\psi\rangle\langle\psi|] = 0$ for arbitrary pure states $\psi$), the
problem of quantising the former can be reduced to that of quantising the
latter. Finding the proper formula for quantum relative entropy, in turn, can
be approached from two different angles which I shall discuss separately in
the following subsections. The first approach makes use of a central result
of quantum information theory, the so-called quantum Stein lemma. The
second approach is axiomatic in nature and starts from a set of consistency
requirements for meta-probabilities. There is a third subsection in which I
list some useful properties of quantum entropies.

^4 The logic of these statistical arguments is similar to the textbook treatment of the
classical canonical distribution as describing just one constituent of some larger
microcanonical ensemble.



2.1 Quantum Stein lemma 

I start out by recapitulating how the concept of relative entropy emerges
in classical statistical inference, following closely my earlier didactical note
[Rau98]. Given a classical prior probability distribution $\{p_i\}$ for the results
$i = 1 \ldots d$, the probability that $N$ trials will yield the (generally different)
relative frequencies $\{f_i = N_i/N\}$ is

$$\mathrm{prob}(\{f_i\}|\{p_i\}, N) = \frac{N!}{N_1! \cdots N_d!}\; p_1^{N_1} \cdots p_d^{N_d}. \qquad (20)$$

Here the second factor is the probability for one specific outcome with sample
numbers $\{N_i\}$, whereas the first factor counts the number of all outcomes
that give rise to the same set of sample numbers. With the definition (17)
of classical relative entropy and the shorthand notations $f = \{f_i\}$, $p = \{p_i\}$,
as well as $f^{\otimes N}$, $p^{\otimes N}$ for composite distributions pertaining to $N$ trials, one
can also write

$$\mathrm{prob}(f^{\otimes N}|p^{\otimes N}) = \mathrm{prob}(f^{\otimes N}|f^{\otimes N})\, \exp[-N S(f\|p)]. \qquad (21)$$

By virtue of Stirling's formula

$$x! \approx \sqrt{2\pi x}\; x^x e^{-x} \qquad (22)$$

the pre-factor can be approximated by

$$\mathrm{prob}(f^{\otimes N}|f^{\otimes N}) \approx (2\pi N)^{-(d-1)/2} \prod_{i=1}^{d} f_i^{-1/2} \qquad (23)$$

and thus scales as $N^{-(d-1)/2}$ with the number of trials. For $N \to \infty$ the
meta-distribution $\mathrm{prob}(f^{\otimes N}|p^{\otimes N})$ of frequencies becomes completely dominated by
the exponential $\exp[-N S(f\|p)]$. Then the probability with which any given
frequency distribution $f$ is realized is essentially determined by the quantity
$S(f\|p)$: The larger this quantity, the less likely the frequency distribution
is realized. Since $S(f\|p) \geq 0$ with equality if and only if $f = p$, the
meta-distribution becomes sharply peaked around $f = p$.
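The factorisation (21) is an exact identity for the multinomial distribution (20), and the Stirling estimate (23) of the pre-factor is already accurate for moderate $N$. Both statements can be checked with the following sketch; the sample numbers, the prior and the function names are hypothetical choices made for illustration.

```python
import math

def multinomial_prob(counts, p):
    """Eq. (20): probability of the sample numbers {N_i} under the distribution p."""
    n = sum(counts)
    log_prob = math.lgamma(n + 1)  # ln N!
    for n_i, p_i in zip(counts, p):
        log_prob -= math.lgamma(n_i + 1)  # subtract ln N_i!
        if n_i > 0:
            log_prob += n_i * math.log(p_i)
    return math.exp(log_prob)

def relative_entropy(f, p):
    """Classical relative entropy S(f||p), Eq. (17)."""
    return sum(f_i * math.log(f_i / p_i) for f_i, p_i in zip(f, p) if f_i > 0.0)

def stirling_prefactor(f, n):
    """Eq. (23): (2 pi N)^(-(d-1)/2) * prod_i f_i^(-1/2)."""
    d = len(f)
    return (2.0 * math.pi * n) ** (-(d - 1) / 2.0) / math.sqrt(math.prod(f))

counts = [30, 50, 20]  # hypothetical sample numbers, N = 100
p = [0.2, 0.5, 0.3]    # hypothetical prior
n = sum(counts)
f = [c / n for c in counts]

lhs = multinomial_prob(counts, p)
rhs = multinomial_prob(counts, f) * math.exp(-n * relative_entropy(f, p))
print(lhs, rhs)
print(multinomial_prob(counts, f), stirling_prefactor(f, n))
```

The logarithm of the Gamma function is used instead of explicit factorials so that the same functions remain usable for much larger $N$.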

The observed relative frequencies $\{f_i\}$ may be visualized as Cartesian
coordinates of a point in a $d$-dimensional vector space, where $f_i \in [0, 1]$ and the
normalization condition $\sum_i f_i = 1$ restrict the allowed points to some portion
of a $(d-1)$-dimensional hyperplane. In this hyperplane portion there is a
unique point, namely $f = p$, at which the quantity $S(f\|p)$ vanishes;
everywhere else $S(f\|p)$ is strictly positive. It is possible to define new coordinates
$\{x_1 \ldots x_{d-1}\}$ in the hyperplane such that (i) they are linear functions of the
$\{f_i\}$; (ii) the origin ($x = 0$) is at $f = p$; and (iii) in the vicinity of $f = p$,

$$S(f(x)\|p) = a r^2 + O(r^3), \qquad a > 0, \qquad (24)$$

where

$$r := \sqrt{\sum_{i=1}^{d-1} x_i^2}. \qquad (25)$$

Frequency distributions whose $S(f(x)\|p)$ exceeds some finite threshold $\Delta S$
thus lie outside a hypersphere around $f = p$, the sphere's radius $R$ being
given by $aR^2 = \Delta S$. The probability that $N$ trials will yield such a frequency
distribution outside the hypersphere is

$$\mathrm{prob}[S(f\|p) > \Delta S\,|\,(d-1), N] = \frac{\int_R^\infty dr\; r^{d-2} \exp(-N a r^2)}{\int_0^\infty dr\; r^{d-2} \exp(-N a r^2)}. \qquad (26)$$

Here $(d-1)$ is noted as the dimension of the hyperplane. The factors
$r^{d-2}$ in the integrand are due to the volume element, while the
exponentials $\exp(-N a r^2)$ stem from the asymptotically dominant exponential factor
in the meta-probability (21). Substituting $t := N a r^2$ and using

$$\Gamma\left(\frac{d-1}{2}\right) = \int_0^\infty dt\; t^{\frac{d-1}{2}-1} \exp(-t) \qquad (27)$$

one may also write

$$\mathrm{prob}[S(f\|p) > \Delta S\,|\,(d-1), N] = \frac{1}{\Gamma\left(\frac{d-1}{2}\right)} \int_{N\Delta S}^\infty dt\; t^{\frac{d-1}{2}-1} \exp(-t); \qquad (28)$$

which for large $N$ ($N \gg d/\Delta S$) can be approximated by

$$\mathrm{prob}[S(f\|p) > \Delta S\,|\,(d-1), N] \approx \frac{1}{\Gamma\left(\frac{d-1}{2}\right)}\, (N\Delta S)^{\frac{d-1}{2}-1} \exp(-N\Delta S). \qquad (29)$$



As the number of trials increases, this probability rapidly tends to zero
for any finite $\Delta S$. As $N \to \infty$, therefore, it becomes virtually certain that
the measured frequency distribution $f$ has $S(f\|p)$ very close to zero, and
hence coincides with the prior $p$. So not only does $f = p$ represent the
frequency distribution that is the most likely to be realized (cf. Eq. (21));
but in addition, as $N$ increases, all other (theoretically allowed) frequency
distributions become more and more concentrated near $f = p$: Frequency
distributions other than $f = p$ become highly atypical. Entropy fluctuations
around $f = p$ have decreasing size
of order $O(1/N)$, which due to their quadratic dependence on $|f - p|$
correspond to a frequency range $|f - p| \sim O(1/\sqrt{N})$. These results are known as
the "entropy concentration theorem" [Jay79].
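The concentration of frequencies can be observed directly in the simplest case $d = 2$, where the tail probability can be summed exactly rather than estimated from the Gaussian approximation. In the sketch below the prior $p = 0.3$, the threshold $\Delta S = 0.02$, the values of $N$ and all function names are hypothetical choices for illustration.

```python
import math

def log_binomial_pmf(k, n, p):
    """Logarithm of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def relative_entropy(f, p):
    """Classical relative entropy S(f||p), with the convention 0 ln 0 = 0."""
    return sum(f_i * math.log(f_i / p_i) for f_i, p_i in zip(f, p) if f_i > 0.0)

def tail_probability(n, p, delta_s):
    """Exact probability that n trials yield frequencies with S(f||p) > delta_s (d = 2)."""
    total = 0.0
    for k in range(n + 1):
        f = k / n
        if relative_entropy([f, 1.0 - f], [p, 1.0 - p]) > delta_s:
            total += math.exp(log_binomial_pmf(k, n, p))
    return total

tails = [tail_probability(n, p=0.3, delta_s=0.02) for n in (50, 200, 800)]
print(tails)
```

The printed probabilities fall off rapidly with $N$, in line with the exponential factor $\exp(-N\Delta S)$ in the asymptotic estimate.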

In the asymptotic regime $N \to \infty$ one may formulate the following
hypothesis $\Gamma_q^N$: "After $N$ trials the measured relative frequencies are within
$O(1/\sqrt{N})$ around $q$." This frequency range corresponds to relative entropies
up to $\Delta S \sim O(1/N)$ with respect to $q$. According to the entropy
concentration theorem the hypothesis is almost certainly true in the state $q^{\otimes N}$,

$$\mathrm{prob}(\Gamma_q^N|q^{\otimes N}) \approx 1 - \mathrm{prob}[S(f\|q) > 1/N\,|\,(d-1), N] \sim O(1), \qquad (31)$$

independent of $N$. Let $p \neq q$ be some other distribution that is a finite
distance away from $q$, $|p - q| \sim O(1)$. In this different state the probability
becomes

$$\mathrm{prob}(\Gamma_q^N|p^{\otimes N}) = \sum_{f \in \Gamma_q^N} \mathrm{prob}(f^{\otimes N}|p^{\otimes N}) = \sum_{f \in \Gamma_q^N} \mathrm{prob}(f^{\otimes N}|f^{\otimes N})\, \exp[-N S(f\|p)]. \qquad (32)$$

Since measurable relative frequencies are spaced at intervals of size $O(1/N)$,
the summation is over $O([(1/\sqrt{N})/(1/N)]^{(d-1)}) \sim O(N^{(d-1)/2})$ different
measurable distributions; which combined with the asymptotics of Eq. (23)
implies

$$\sum_{f \in \Gamma_q^N} \mathrm{prob}(f^{\otimes N}|f^{\otimes N}) \sim O(1). \qquad (33)$$

Thus the asymptotic dependence on $N$ is determined entirely by the
exponential function. Its argument may be expanded around $f = q$ in powers
of $(f - q) \sim O(1/\sqrt{N})$ and for $N \to \infty$ is dominated by the leading term
$[-N S(q\|p)]$. In this limit it is therefore

$$\lim_{N\to\infty} \frac{1}{N} \ln \mathrm{prob}(\Gamma_q^N|p^{\otimes N}) = -S(q\|p). \qquad (34)$$

Due to the entropy concentration theorem any hypothesis $\Gamma^N$ which is
true in $q^{\otimes N}$ in the asymptotic sense, $\mathrm{prob}(\Gamma^N|q^{\otimes N}) \sim O(1)$, must relate to
a frequency range that encloses the $O(1/\sqrt{N})$-neighborhood of $q$. Hence
$\Gamma^N \supseteq \Gamma_q^N$ and $\mathrm{prob}(\Gamma^N|p^{\otimes N}) \geq \mathrm{prob}(\Gamma_q^N|p^{\otimes N})$ for any $p$. Consequently the
above limit can also be expressed as

$$\lim_{N\to\infty} \frac{1}{N} \ln \inf_{\Gamma^N} \left\{ \mathrm{prob}(\Gamma^N|p^{\otimes N}) \,\Big|\, \mathrm{prob}(\Gamma^N|q^{\otimes N}) \sim O(1) \right\} = -S(q\|p). \qquad (35)$$

It is this result which allows one to build a bridge to the quantum case. In
close analogy to the above classical case one can consider the set of hypotheses
$\Gamma^N$ (now represented by projection operators or, more generally, positive
operators) that pertain to $N$-partite sequences, and whose probabilities in
some state $\rho^{\otimes N}$ are of order $O(1)$, i.e., always larger than some finite threshold
$1 - \epsilon$ ($0 < \epsilon < 1$) regardless of the number $N$ of trials. The probability that
such a hypothesis is found true in a different state $\sigma^{\otimes N}$ has a lower bound
that satisfies

$$\lim_{N\to\infty} \frac{1}{N} \ln \inf_{\Gamma^N} \left\{ \mathrm{tr}(\sigma^{\otimes N}\Gamma^N) \,\Big|\, 0 \leq \Gamma^N \leq I,\ \mathrm{tr}(\rho^{\otimes N}\Gamma^N) \geq 1 - \epsilon \right\} = -S(\rho\|\sigma) \qquad (36)$$

independently of the precise value of $\epsilon$, where $S(\rho\|\sigma)$ is the quantum relative
entropy as defined in Eq. (16). This is the "quantum Stein lemma". Its
proof is rather intricate due to the possibility that the observables used to
prepare the prior and the observables being subsequently measured need not
commute, and can be found elsewhere [HP91, ON00, Hay06, Pet08].

The limits (35) and (36) suggest a common interpretation of relative
entropy in both the classical and quantum cases. The infimum on the
left-hand side may be regarded, loosely, as the probability that despite prior
preparation of a system in the state $p$ or $\sigma$, a high-resolution measurement
on $N$ replicas will yield an outcome that corresponds to a different state (or
range of states within an $O(1/\sqrt{N})$-neighborhood of) $q$ or $\rho$, respectively.
Asymptotically, with each additional trial the probability for such deviating
evidence decreases further by a factor $\exp[-S(q\|p)]$ or

$$\mathrm{prob}(\rho|\sigma) := \exp[-S(\rho\|\sigma)], \qquad (37)$$

respectively.
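The classical limit (34), on which the analogy with the quantum Stein lemma rests, can be illustrated numerically for $d = 2$. The sketch below sums binomial probabilities under $p$ over all frequencies within $c/\sqrt{N}$ of $q$ and compares $(1/N)$ times the logarithm of that sum with $-S(q\|p)$. The values $q = 0.5$, $p = 0.3$, the window constant $c = 0.5$ and the function names are hypothetical; sums are carried out in the log domain because the probabilities underflow for large $N$.

```python
import math

def log_binomial_pmf(k, n, p):
    """Logarithm of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def log_sum_exp(log_terms):
    """Numerically stable ln(sum_i exp(log_terms[i]))."""
    largest = max(log_terms)
    return largest + math.log(sum(math.exp(t - largest) for t in log_terms))

def binary_relative_entropy(q, p):
    """S(q||p) for two-outcome distributions (q, 1-q) and (p, 1-p)."""
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))

def stein_exponent(n, q, p, c=0.5):
    """(1/N) ln prob that, under p, the frequency lies within c/sqrt(N) of q."""
    half_width = round(c * math.sqrt(n))
    centre = round(q * n)
    ks = range(centre - half_width, centre + half_width + 1)
    return log_sum_exp([log_binomial_pmf(k, n, p) for k in ks]) / n

q, p = 0.5, 0.3
target = -binary_relative_entropy(q, p)
errors = [abs(stein_exponent(n, q, p) - target) for n in (100, 1000, 10000)]
print(target, errors)
```

Convergence is slow, of order $1/\sqrt{N}$, because the dominant contribution comes from the edge of the window closest to $p$.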

2.2 Axiomatic approach 

The axiomatic approach to quantisation takes the interpretation (37) not
as a result but as its starting point. This approach formulates a number
of consistency requirements either for the quantum relative entropy directly
[Don86] or, as I shall prefer to do here, for the meta-probability $\mathrm{prob}(\rho|\sigma)$.
These consistency requirements lead unequivocally to the proper definition
of quantum relative entropy.

The following four requirements for meta-probabilities are
straightforward:

1. The probability for evidence $\rho$ vanishes whenever it lies outside the
support of the prior; or in reverse,

$$\mathrm{prob}(\rho|\sigma) > 0 \iff \mathrm{supp}\,\rho \subseteq \mathrm{supp}\,\sigma. \qquad (38)$$

2. Meta-probabilities are invariant under joint unitary transformations $g$,

$$\mathrm{prob}(g(\rho)|g(\sigma)) = \mathrm{prob}(\rho|\sigma). \qquad (39)$$

3. Provided there is a hypothesis $a$ such that both $\mathrm{supp}\,\rho \subseteq a$ and
$\mathrm{supp}\,\sigma \subseteq a$, meta-probabilities do not change when the Hilbert space
is reduced and distributions are restricted to $a$:

$$\mathrm{prob}(\rho|_a \,|\, \sigma|_a) = \mathrm{prob}(\rho|\sigma). \qquad (40)$$

4. For uncorrelated prior and posterior distributions the meta-probability
factorises,

$$\mathrm{prob}(\rho_A \otimes \rho_B|\sigma_A \otimes \sigma_B) = \mathrm{prob}(\rho_A|\sigma_A) \cdot \mathrm{prob}(\rho_B|\sigma_B). \qquad (41)$$

Two additional requirements are less obvious and require further motivation:

5. The degrees of freedom that completely specify a probability
distribution can often be divided into some that are actually being prepared,
measured or otherwise considered "relevant", and the rest deemed
"irrelevant". By eliminating all information pertaining to the irrelevant
degrees of freedom an arbitrary probability distribution can be reduced
to its relevant part, a procedure known as coarse-graining. Two
examples for such coarse-graining are

(i) discarding all information except for some selected probabilities
$\{\mathrm{tr}(\rho P_i)\}$, where the $\{P_i\}$ are projectors onto mutually orthogonal,
collectively exhaustive (i.e., $\sum_i P_i = I$) subspaces. The relevant
part of a statistical operator $\rho$ is then

$$\mathcal{P}_{\{P_i\}}\rho := \sum_i \frac{\mathrm{tr}(\rho P_i)}{\mathrm{tr}\,P_i}\, P_i; \qquad (42)$$

(ii) removing from the state $\rho_{AB}$ of a composite system all correlations
between the subsystems A and B, yielding the relevant part

$$\frac{(\mathrm{tr}_B\,\rho_{AB}) \otimes (\mathrm{tr}_A\,\rho_{AB})}{\mathrm{tr}_{A\otimes B}\,\rho_{AB}}. \qquad (43)$$

For a generic coarse-graining I shall write $\rho \to \mathcal{P}\rho$. The map $\mathcal{P}$,
acting on the manifold of probability distributions, must satisfy $\mathcal{P}^2 = \mathcal{P}$,
leave the expectation values of relevant observables unchanged, and
under these constraints maximise the resemblance of $\mathcal{P}\rho$ with the (often,
but not always uniform) "ignorance distribution", i.e., maximise the
likelihood $\mathrm{prob}(\mathcal{P}\rho|\text{ignorance})$. Specifying the relevant part $\mathcal{P}\rho$ is thus
tantamount to specifying a system's relevant degrees of freedom. When
the latter are prepared in some state $\mathcal{P}\sigma$, the probability that
subsequent experimental evidence will correspond to a full state $\rho$, and that
data for the relevant degrees of freedom will correspond to a relevant
part $\mathcal{P}\rho$, is according to Bayes rule

$$\mathrm{prob}(\rho, \mathcal{P}\rho|\mathcal{P}\sigma) = \mathrm{prob}(\rho|\mathcal{P}\rho, \mathcal{P}\sigma) \cdot \mathrm{prob}(\mathcal{P}\rho|\mathcal{P}\sigma). \qquad (44)$$

Yet on the left-hand side $\mathcal{P}\rho$ is redundant because it is implied by $\rho$;
and in the first factor on the right-hand side $\mathcal{P}\sigma$ is redundant because
it is superseded by the subsequent deviating evidence $\mathcal{P}\rho$. These
considerations motivate the "chain rule"

$$\mathrm{prob}(\rho|\mathcal{P}\sigma) = \mathrm{prob}(\rho|\mathcal{P}\rho) \cdot \mathrm{prob}(\mathcal{P}\rho|\mathcal{P}\sigma). \qquad (45)$$

6. For the particular coarse-graining (42) the meta-probability $\mathrm{prob}(\mathcal{P}\rho|\mathcal{P}\sigma)$
does not depend on the precise orientation or dimensionality of the
subspaces associated with $\{P_i\}$ but only on the respective sets of relevant
probabilities $\{\mathrm{tr}(\rho P_i)\}$ and $\{\mathrm{tr}(\sigma P_i)\}$; so

$$\mathrm{prob}(\mathcal{P}_{\{P_i\}}\rho|\mathcal{P}_{\{P_i\}}\sigma) = \mathrm{prob}(\{\mathrm{tr}(\rho P_i)\}|\{\mathrm{tr}(\sigma P_i)\}), \qquad (46)$$

where the right-hand side is the classical counterpart of definition (37).

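By virtue of definition (37) the above six requirements for meta-probabilities
translate into properties of the quantum relative entropy:

1. Range:

$$S(\rho\|\sigma) \begin{cases} \in [0, \infty) & : \mathrm{supp}\,\rho \subseteq \mathrm{supp}\,\sigma \\ = +\infty & : \text{otherwise} \end{cases} \qquad (47)$$

2. Unitary invariance:

$$S(g(\rho)\|g(\sigma)) = S(\rho\|\sigma). \qquad (48)$$

3. Invariance under Hilbert space reduction: If both $\mathrm{supp}\,\rho \subseteq a$ and
$\mathrm{supp}\,\sigma \subseteq a$ then

$$S(\rho|_a\|\sigma|_a) = S(\rho\|\sigma). \qquad (49)$$

4. Additivity:

$$S(\rho_A \otimes \rho_B\|\sigma_A \otimes \sigma_B) = S(\rho_A\|\sigma_A) + S(\rho_B\|\sigma_B). \qquad (50)$$

5. "Pythagorean theorem" [Pet08]:

$$S(\rho\|\mathcal{P}\sigma) = S(\rho\|\mathcal{P}\rho) + S(\mathcal{P}\rho\|\mathcal{P}\sigma). \qquad (51)$$

6. Quantum-classical interface:

$$S(\mathcal{P}_{\{P_i\}}\rho\|\mathcal{P}_{\{P_i\}}\sigma) = S(\{\mathrm{tr}(\rho P_i)\}\|\{\mathrm{tr}(\sigma P_i)\}) \qquad (52)$$

with the right-hand side given by definition (17).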
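Some of the properties listed above can be checked numerically for a single qubit, taking the quantum relative entropy in its standard form $\mathrm{tr}(\rho\ln\rho - \rho\ln\sigma)$. The sketch assumes the Bloch-vector parametrisation $\rho = (1 + \vec{r}\cdot\vec{\sigma})/2$, for which $\ln\sigma = \frac{1}{2}\ln\frac{1-s^2}{4} + \mathrm{artanh}(s)\,\hat{s}\cdot\vec{\sigma}$ with $s = |\vec{s}| < 1$; the parametrisation, the example vectors and the function names are assumptions for illustration. A rotation about the $z$ axis stands in for a unitary transformation $g$.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def neg_entropy(r):
    """tr(rho ln rho) for rho = (1 + r . sigma) / 2."""
    n = norm(r)
    return sum(p * math.log(p) for p in ((1.0 + n) / 2.0, (1.0 - n) / 2.0) if p > 0.0)

def relative_entropy(r, s):
    """S(rho||sigma) = tr(rho ln rho - rho ln sigma) for Bloch vectors r, s with |s| < 1."""
    n = norm(s)
    dot = sum(a * b for a, b in zip(r, s))
    cross = 0.0 if n == 0.0 else math.atanh(n) * dot / n
    return neg_entropy(r) - 0.5 * math.log((1.0 - n * n) / 4.0) - cross

def rotate_z(v, angle):
    """Unitary rotation about the z axis, acting on the Bloch vector."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1], v[2])

r = (0.2, 0.1, 0.5)   # hypothetical state rho
s = (-0.3, 0.4, 0.1)  # hypothetical reference state sigma

print(relative_entropy(r, s), relative_entropy(rotate_z(r, 0.7), rotate_z(s, 0.7)))
print(relative_entropy(r, (0.0, 0.0, 0.0)), math.log(2) + neg_entropy(r))
```

The first line of output illustrates unitary invariance (48); the second compares $S(\rho\|1/d)$ with $S[1/d] - S[\rho]$ for $d = 2$, i.e., relation (19).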

According to relation (19) the ordinary entropy can always be expressed
in terms of the relative entropy. I will now show that the converse is also true.
To begin with, if $\rho$ and $\sigma$ commute then they have a joint eigenbasis, i.e.,
there is a set of projectors $\{P_i\}$ such that both $\mathcal{P}_{\{P_i\}}\rho = \rho$ and $\mathcal{P}_{\{P_i\}}\sigma = \sigma$.
Inserting these into Eq. (52) yields

$$[\rho, \sigma] = 0 \;\Rightarrow\; S(\rho\|\sigma) = S(\{\mathrm{tr}(\rho P_i)\}\|\{\mathrm{tr}(\sigma P_i)\}). \qquad (53)$$

It is possible to extend the Hilbert space from dimension $d$ to a larger
dimension $D$, $D > d$, and to define in this extended Hilbert space new states

$$\tilde\rho := \sum_i \frac{\mathrm{tr}(\rho P_i)}{\mathrm{tr}\,\tilde P_i}\, \tilde P_i, \qquad \tilde\sigma := \sum_i \frac{\mathrm{tr}(\sigma P_i)}{\mathrm{tr}\,\tilde P_i}\, \tilde P_i, \qquad (54)$$

which still commute, and where now $\sum_i \tilde P_i = I_D$. Under such an extension
relevant probabilities are conserved in the sense that

$$\mathrm{tr}(\tilde\rho \tilde P_i) = \mathrm{tr}(\rho P_i), \qquad \mathrm{tr}(\tilde\sigma \tilde P_i) = \mathrm{tr}(\sigma P_i); \qquad (55)$$

which in combination with Eq. (53), applied to both the original and the
extended states, implies the conservation of relative entropy

$$S(\rho\|\sigma) = S(\tilde\rho\|\tilde\sigma). \qquad (56)$$

For sufficiently large (possibly infinite) $D$ the new dimensionalities $\{\mathrm{tr}\,\tilde P_i\}$
can be chosen such that to arbitrary precision

$$\mathrm{tr}(\tilde\sigma \tilde P_i) = \mathrm{tr}\,\tilde P_i / D \qquad (57)$$

and thus

$$\tilde\sigma = 1/D. \qquad (58)$$

For this particular choice of dimensions the relative entropy reduces to a
difference of ordinary entropies:

$$S(\rho\|\sigma) = S[1/D] - S[\tilde\rho]. \qquad (59)$$

In the more general case where $\rho$ and $\sigma$ need not commute one can make
use of the Pythagorean theorem (51) with coarse-graining $\mathcal{P}_{\{P_i^\sigma\}}$, where the
$\{P_i^\sigma\}$ project onto the eigenspaces of $\sigma$, to obtain both

$$S(\rho\|\sigma) = S(\rho\|\mathcal{P}_{\{P_i^\sigma\}}\rho) + S(\mathcal{P}_{\{P_i^\sigma\}}\rho\|\sigma) \qquad (60)$$

and

$$S(\rho\|1/d) = S(\rho\|\mathcal{P}_{\{P_i^\sigma\}}\rho) + S(\mathcal{P}_{\{P_i^\sigma\}}\rho\|1/d). \qquad (61)$$

Here we have used $\mathcal{P}_{\{P_i^\sigma\}}\sigma = \sigma$ and $\mathcal{P}_{\{P_i^\sigma\}}(1/d) = 1/d$, respectively.
Combining these two results into

$$S(\rho\|\sigma) = S(\mathcal{P}_{\{P_i^\sigma\}}\rho\|\sigma) + S(\rho\|1/d) - S(\mathcal{P}_{\{P_i^\sigma\}}\rho\|1/d), \qquad (62)$$

the right-hand side now contains only relative entropies between states that
commute. Hence one may proceed with the same Hilbert space extension as
above to express again the relative entropy in terms of ordinary entropies,

$$S(\rho\|\sigma) = S[1/D] - S[\widetilde{\mathcal{P}_{\{P_i^\sigma\}}\rho}] + S[\mathcal{P}_{\{P_i^\sigma\}}\rho] - S[\rho], \qquad (63)$$

even when $\rho$ and $\sigma$ do not commute. So ultimately one can go full circle to
reduce the problem of quantising relative entropy back to that of quantising
ordinary entropy.

As for the ordinary quantum entropy, the following four properties are
implied by those of the quantum relative entropy:

1. Unitary invariance: The invariance (48) of relative entropy and of the
uniform distribution, $g(1/d) = 1/d$, together entail

$$S[g(\rho)] = S[\rho]. \qquad (64)$$

2. Invariance under Hilbert space reduction: If $\mathrm{supp}\,\rho \subseteq a$ then $P\rho P = \rho$,
where $P$ projects onto $a$; and (provided $\rho$ is normalised) the associated
coarse-graining (42) yields $\mathcal{P}_{\{P, 1-P\}}\rho = P/\mathrm{tr}P$. With this particular
coarse-graining and $\sigma = 1/d$ the Pythagorean theorem (51) reads

$$S(\rho\|1/d) = S(\rho\|P/\mathrm{tr}P) + S(P/\mathrm{tr}P\|1/d). \qquad (65)$$

By invariance (49) of the relative entropy under Hilbert space reduction
it is

$$S(\rho\|P/\mathrm{tr}P) = S(\rho|_a\|1/\mathrm{tr}P); \qquad (66)$$

so Eq. (65) combined with the calibration $S[\rho] = S[\rho|_a] = 0$ for pure $\rho$
implies

$$S(P/\mathrm{tr}P\|1/d) = S[1/d] - S[1/\mathrm{tr}P], \qquad (67)$$

and for general $\rho$

$$S[\rho|_a] = S[\rho]. \qquad (68)$$

3. Additivity: The direct product of two pure (constituent) states is again
a pure (composite) state. Likewise the direct product of two uniform
(constituent) distributions gives the uniform (composite) distribution.
The additivity (50) of relative entropy applied to pure $\rho$ and uniform
$\sigma$ then yields

$$S[1_{A\otimes B}/d_{A\otimes B}] = S[1_A/d_A] + S[1_B/d_B], \qquad (69)$$

and applied to arbitrary $\rho$

$$S[\rho_A \otimes \rho_B] = S[\rho_A] + S[\rho_B]. \qquad (70)$$

4. Subadditivity: With the decorrelator (43) as coarse-graining and $\sigma = 1/d$
the Pythagorean theorem (51) reads

$$S(\rho_{AB}\|1/d) = S(\rho_{AB}\|\rho_A \otimes \rho_B) + S(\rho_A \otimes \rho_B\|1/d). \qquad (71)$$

As the relative entropy $S(\rho_{AB}\|\rho_A \otimes \rho_B)$ is never negative, this implies

$$S[\rho_{AB}] \leq S[\rho_A] + S[\rho_B]. \qquad (72)$$

These four properties, combined with some natural assumptions regarding
good mathematical behavior (continuity, extendibility to infinite dimension),
determine the ordinary quantum entropy uniquely: Up to a numerical factor
it must coincide with the von Neumann entropy (12) [Och75]. Inserting this
result into (63) then yields the quantum relative entropy (16), Q.E.D.

2.3 Some properties

For completeness I note some properties of the two quantum entropies that
are not contained in, and hence are consequences of, the above axioms. More
properties can be found in Refs. [AL70, Weh78, Thi83, Rus02].

i. The relative entropy vanishes if and only if distributions are equal,

$$S(\rho\|\sigma) = 0 \iff \rho = \sigma. \qquad (73)$$

ii. It scales with the normalisation of its arguments,

$$S(a\rho\|a\sigma) = a\,S(\rho\|\sigma) \quad \forall\, a > 0; \qquad (74)$$

and

iii. is quasi-linear in its first argument in the sense that for $t \in [0, 1]$

$$S(t\rho + (1-t)\mu\|\sigma) = t\,S(\rho\|\sigma) + (1-t)\,S(\mu\|\sigma) - \{S[t\rho + (1-t)\mu] - t\,S[\rho] - (1-t)\,S[\mu]\} \qquad (75)$$

where the expression in $\{\cdot\}$ does not depend on $\sigma$.

iv. Moreover, the expression in $\{\cdot\}$ is non-negative since the ordinary
entropy is strictly concave,

$$S[t\rho + (1-t)\mu] \geq t\,S[\rho] + (1-t)\,S[\mu], \qquad (76)$$

with equality if and only if $\rho = \mu$.

v. The relative entropy is in general not symmetric,

$$S(\rho\|\sigma) \neq S(\sigma\|\rho), \qquad (77)$$

which is plausible from its definition (37) via meta-probabilities: A
prior with broad support may yield evidence with narrow support, but
not vice versa.

vi. However, for two infinitesimally close states with identical
normalisation the relative entropy is approximately quadratic in $\delta\rho$,

$$\mathrm{tr}(\delta\rho) = 0 \;\Rightarrow\; S(\rho + \delta\rho\|\rho) \sim O((\delta\rho)^2), \qquad (78)$$

and hence approximately symmetric. Thus the relative entropy endows
any submanifold $S(d)|_l$ of states normalised to $\mathrm{tr}\,\rho = l \in (0, 1]$ with a
positive definite metric, rendering it a Riemannian manifold [BAR86].
The volume element associated with this Riemannian metric will yield
the proper integration measures in the de Finetti representation (6)
and quantum Bayes rule (9), provided integration is restricted to a
submanifold $S(d)|_l$.

vii. Again for states with identical normalisation $l$, any trace-preserving
completely positive (TP-CP) map $\Phi$ can only decrease, but never
increase their relative entropy ("monotonicity") [Lin75]:

$$S(\Phi(\rho)\|\Phi(\sigma)) \leq S(\rho\|\sigma) \quad \forall\, \rho, \sigma \in S(d)|_l. \qquad (79)$$

viii. Finally, for a composite system the ordinary quantum entropy is not
only bounded from above by inequality (72) but also bounded from
below by [AL70]

$$|S[\rho_A] - S[\rho_B]| \leq S[\rho_{AB}]. \qquad (80)$$

This lower bound has no classical analog. It implies in particular
$S[\rho_A] = S[\rho_B]$ whenever $\rho_{AB}$ is pure.^5
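The consequence of the lower bound for pure composite states can be checked for two qubits. The sketch below assumes a pure state $|\psi\rangle = \sum_{ij} c_{ij}|i\rangle_A|j\rangle_B$ with hypothetical amplitudes, forms both reduced density matrices, and evaluates their von Neumann entropies from the eigenvalues of a $2\times2$ Hermitian matrix; the function names are illustrative only.

```python
import math

def reduced_states(c):
    """Reduced density matrices of |psi> = sum_ij c[i][j] |i>_A |j>_B (two qubits)."""
    rho_a = [[sum(c[i][j] * c[k][j].conjugate() for j in range(2)) for k in range(2)]
             for i in range(2)]
    rho_b = [[sum(c[i][j] * c[i][l].conjugate() for i in range(2)) for l in range(2)]
             for j in range(2)]
    return rho_a, rho_b

def entropy_2x2(rho):
    """Von Neumann entropy of a 2x2 Hermitian matrix via its trace and determinant."""
    trace = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eigenvalues = ((trace + disc) / 2.0, (trace - disc) / 2.0)
    return -sum(p * math.log(p) for p in eigenvalues if p > 1e-15)

# Hypothetical amplitudes of an entangled pure state, normalised below.
amplitudes = [[0.5 + 0.5j, 0.1j], [0.3 + 0.0j, 0.2 - 0.6j]]
length = math.sqrt(sum(abs(x) ** 2 for row in amplitudes for x in row))
c = [[x / length for x in row] for row in amplitudes]

rho_a, rho_b = reduced_states(c)
s_a, s_b = entropy_2x2(rho_a), entropy_2x2(rho_b)
print(s_a, s_b)
```

Since the composite state is pure, its entropy vanishes; the two constituent entropies coincide but are strictly positive for this entangled example.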

3 Quantum state reconstruction with incomplete data

In this section I turn to the second issue raised in the introduction, namely
the rationale for employing the principle of minimum relative entropy when
reconstructing quantum states on the basis of incomplete data.

I begin with some notation and terminology. The subspace $\mathrm{span}\{I, G_a\}$
of Liouville space shall be termed the level of description [RM96].^6 Elements
of the state manifold $S(d)$ which satisfy the $m + 1$ constraints $\langle I \rangle = l$ and
$\{\langle G_a \rangle = g_a, a = 1 \ldots m\}$ form a submanifold that shall be denoted by $S(d)|_{l,g}$.
That point on $S(d)|_{l,g}$ which minimises the relative entropy with respect to
some reference state $\sigma$ is the MinREnt distribution (14); it shall be denoted
by $\mu^\sigma_{l,g} \in S(d)|_{l,g}$. There is a coarse-graining operation $\mathcal{P}^\sigma_G$ that maps an
arbitrary state $\rho$ to

$$\mathcal{P}^\sigma_G\, \rho := \mu^\sigma_{\langle I \rangle_\rho, \langle G \rangle_\rho}, \qquad (81)$$

i.e., that minimises relative entropy with respect to $\sigma$ while retaining
complete information about the selected level of description.^7 This map has a
number of useful properties:

i. It leaves the reference state $\sigma$ unchanged,

$$\mathcal{P}^\sigma_G\, \sigma = \sigma. \qquad (82)$$

ii. It is idempotent, $\mathcal{P}^2 = \mathcal{P}$. Moreover, even for $\omega \neq \sigma$ it is

$$\mathcal{P}^\sigma_G\, \mathcal{P}^\omega_G = \mathcal{P}^\sigma_G; \qquad (83)$$

and for arbitrary extensions $\mathrm{span}\{I, G_a\} \to \mathrm{span}\{I, G_a, F_b\}$ of the level
of description

$$\mathcal{P}^\sigma_{G,F}\, \mathcal{P}^\sigma_G = \mathcal{P}^\sigma_G\, \mathcal{P}^\sigma_{G,F} = \mathcal{P}^\sigma_G. \qquad (84)$$

iii. It satisfies the Pythagorean theorem (51),

$$S(\rho\|\mathcal{P}^\omega_G \sigma) = S(\rho\|\mathcal{P}^\omega_G \rho) + S(\mathcal{P}^\omega_G \rho\|\mathcal{P}^\omega_G \sigma), \qquad (85)$$

which for $\omega = \sigma$ simplifies to

$$S(\rho\|\sigma) = S(\rho\|\mu^\sigma_{\langle I \rangle_\rho, \langle G \rangle_\rho}) + S(\mu^\sigma_{\langle I \rangle_\rho, \langle G \rangle_\rho}\|\sigma). \qquad (86)$$

iv. It encompasses the previously used coarse-grainings (42) and (43) as
special cases. They correspond to the choice $\omega = 1/d$ and level of
description $\mathrm{span}\{P_i\}$ or $\mathrm{span}\{X_A \otimes I_B, I_A \otimes Y_B\}$, respectively.

v. Finally, $\mathcal{P}^\omega_G$ is a trace-preserving and positive, but generally not a
linear map. In those special cases where it is linear (e.g., Eq. (42) but
not (43)) it constitutes a TP-CP map; then by monotonicity (79) it
reduces relative entropy:

$$S((\mathcal{P}^\omega_G)_{\mathrm{lin}}\, \rho\|(\mathcal{P}^\omega_G)_{\mathrm{lin}}\, \sigma) \leq S(\rho\|\sigma). \qquad (87)$$

To what extent such monotonicity also holds for general, non-linear
coarse-graining operations is an open problem worth investigating.^8

^5 But it does not necessarily imply $S[\rho_A] = S[\rho_B] = 0$.
^6 Sometimes this is also called the "observation level" [FS90].
^7 In the special case $\sigma = 1/d$ this map is known as the Kawasaki-Gunton projector
[KG73, FS90, RM96].
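The Pythagorean theorem and the monotonicity property can be illustrated in the simplest linear case: the coarse-graining (42) for a qubit, with the two projectors onto the eigenstates of $\sigma_z$. In the Bloch-vector parametrisation $\rho = (1 + \vec{r}\cdot\vec{\sigma})/2$ this map simply discards the $x$ and $y$ components. The parametrisation, the example vectors and the function names are assumptions for illustration, and the relative entropy is taken in its standard form $\mathrm{tr}(\rho\ln\rho - \rho\ln\sigma)$.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def neg_entropy(r):
    """tr(rho ln rho) for rho = (1 + r . sigma) / 2."""
    n = norm(r)
    return sum(p * math.log(p) for p in ((1.0 + n) / 2.0, (1.0 - n) / 2.0) if p > 0.0)

def relative_entropy(r, s):
    """S(rho||sigma) = tr(rho ln rho - rho ln sigma) for Bloch vectors r, s with |s| < 1."""
    n = norm(s)
    dot = sum(a * b for a, b in zip(r, s))
    cross = 0.0 if n == 0.0 else math.atanh(n) * dot / n
    return neg_entropy(r) - 0.5 * math.log((1.0 - n * n) / 4.0) - cross

def dephase(v):
    """Coarse-graining (42) with projectors onto the sigma_z eigenstates."""
    return (0.0, 0.0, v[2])

r = (0.2, 0.1, 0.5)   # hypothetical state rho
s = (-0.3, 0.4, 0.1)  # hypothetical state sigma

lhs = relative_entropy(r, dephase(s))
rhs = relative_entropy(r, dephase(r)) + relative_entropy(dephase(r), dephase(s))
print(lhs, rhs)
print(relative_entropy(dephase(r), dephase(s)), relative_entropy(r, s))
```

The first line of output compares the two sides of the Pythagorean theorem; the second shows that the coarse-grained relative entropy does not exceed the original one.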

The argument for describing macroscopic systems with MinREnt distri- 
butions of the form (14) now goes as follows. In a macroscopic world the 
purpose of an effective description is to infer from few known averages l and 
{qo}, ascertained in N trials on an exchangeable sequence, the expected val- 
ues of other — equally macroscopic — averages {/{,} to be measured in further 
M trials; i.e., to determine (in vector notation) 

f:^ jdf[...dnwoh{Tf,\Tl)f' . (88) 

Here, following the logic of my earlier discussion of the quantum Stein lemma 
(Section 2.1), I have introduced hypotheses F that pertain to exchangeable 
sequences of length M or N , respectively Yet rather than to tomographic 
evidence for some complete state p these hypotheses now refer only to a 
small number of macroscopic averages that together do not specify a state 
completely. In particular the hypothesis Tf stipulates: "After M trials the 
measured averages for {F^} are within an 0(l/-\/M)-range around {/&}"; 
and hkewise for F^^. For large M its probability of being true in a state 
— ^jigpg j^qI necessarily {Fi,)^ — fi, — is according to the asymptotics 

(36) 

prob(Ff |a) - / dp ex.p[-MS{p\\a)] , (89) 

Js{d)\f 

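up to possibly a pre-factor that accounts for overlaps of $O(1/\sqrt{M})$-neighborhoods
of the various $\rho$'s but that is independent of $\sigma$. Inserting this likelihood
function into the marginalisation

$$\mathrm{prob}(\Gamma^M_f|\Gamma^N_{\iota,g}) = \int_{S(d)} d\sigma \; \mathrm{prob}(\Gamma^M_f|\sigma)\, \mathrm{prob}(\sigma|\Gamma^N_{\iota,g}) \qquad (90)$$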

^This may become part of a broader effort to move beyond CP maps in the study of 
macroscopic dynamics [SS05, Maj07]. 
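and replacing

$$\int df_1 \ldots df_n \int_{S(d)|_f} d\rho \; f \;\to\; \int_{S(d)} d\rho \; \langle F\rangle_\rho , \qquad (91)$$

one obtains

$$\bar{f} \sim \int_{S(d)} d\sigma \; \mathrm{prob}(\sigma|\Gamma^N_{\iota,g}) \int_{S(d)} d\rho \; \exp[-M S(\rho\|\sigma)] \, \langle F\rangle_\rho . \qquad (92)$$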

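For sufficiently large $M$ the exponential becomes sharply peaked around $\rho = \sigma$,
which (modulo normalisation) leads to

$$\bar{f} \approx \int_{S(d)} d\sigma \; \mathrm{prob}(\sigma|\Gamma^N_{\iota,g}) \, \langle F\rangle_\sigma = \mathrm{tr}\,(\langle\sigma\rangle_{\iota,g}\, F) \qquad (93)$$

with

$$\langle\sigma\rangle_{\iota,g} := \int_{S(d)} d\sigma \; \mathrm{prob}(\sigma|\Gamma^N_{\iota,g}) \, \sigma . \qquad (94)$$

I shall argue that for large $N$ this effective single-constituent state has the
MinREnt form (14).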
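
The exponential asymptotics (89), on which the peaking argument rests, can be made tangible in the simplest classical setting. The sketch below assumes a two-outcome system, so that the probability of observing a relative frequency $f$ in $M$ trials is binomial, and compares its decay rate with the relative entropy. The values of `q` and `f` and the function names are hypothetical choices for illustration only.

```python
import math

def binary_relative_entropy(f, q):
    """S(f||q) for the two-outcome distributions (f, 1-f) and (q, 1-q)."""
    return f * math.log(f / q) + (1 - f) * math.log((1 - f) / (1 - q))

def log_binomial_probability(k, trials, q):
    """Natural log of the probability of k successes in `trials` draws."""
    log_choose = (math.lgamma(trials + 1) - math.lgamma(k + 1)
                  - math.lgamma(trials - k + 1))
    return log_choose + k * math.log(q) + (trials - k) * math.log(1 - q)

q = 0.5   # hypothetical single-trial probability (plays the role of sigma)
f = 0.3   # hypothetical measured relative frequency (plays the role of rho)

exact_rate = binary_relative_entropy(f, q)
rates = {}
for trials in (100, 1000, 10000):
    k = round(f * trials)
    # -(1/M) ln prob should approach S(f||q) as M grows.
    rates[trials] = -log_binomial_probability(k, trials, q) / trials
    print(trials, rates[trials], exact_rate)
```

The residual difference between the empirical rate and the relative entropy shrinks roughly like $\ln M / M$, which is the kind of sub-exponential pre-factor that the asymptotic relation ignores.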

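The posterior $\mathrm{prob}(\sigma|\Gamma^N_{\iota,g})$ is related to the likelihood function $\mathrm{prob}(\Gamma^N_{\iota,g}|\sigma)$
via the quantum Bayes rule (8),

$$\mathrm{prob}(\sigma|\Gamma^N_{\iota,g}) = \frac{\mathrm{prob}(\Gamma^N_{\iota,g}|\sigma)\,\mathrm{prob}(\sigma)}{\mathrm{prob}(\Gamma^N_{\iota,g})} \qquad (95)$$

with

$$\mathrm{prob}(\Gamma^N_{\iota,g}) := \int_{S(d)} d\sigma' \; \mathrm{prob}(\Gamma^N_{\iota,g}|\sigma')\,\mathrm{prob}(\sigma') . \qquad (96)$$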

The likelihood function in turn is given by the integral (89) with replacements
$M \to N$ and $f \to \iota, g$. With the help of the Pythagorean theorem (86) its
integrand can be factorised into

$$\exp[-N S(\rho\|\sigma)] = \exp[-N S(\rho\|\mu^\sigma_{\iota,g})] \cdot \exp[-N S(\mu^\sigma_{\iota,g}\|\sigma)] \qquad (97)$$

and the second factor, which no longer depends on $\rho$, be taken out of the
integral. Applying to the argument $S(\rho\|\mu^\sigma_{\iota,g})$ of the other exponential the
quadratic approximation (78), the remaining integration is over a multivariate
Gaussian and, with the integration measure properly chosen to correspond
to the Riemannian metric induced by the relative entropy, yields a result
that is independent of $\sigma$. Thus the $\sigma$-dependence of the likelihood function is
determined entirely by the factor taken out of the integral. For large $N$ this
factor becomes sharply peaked around $\sigma = \mu^\sigma_{\iota,g}$. One can then effectively
impose $\sigma \in S(d)|_{\iota,g}$ and restrict the integration in Eqs. (94) and (96) to
$S(d)|_{\iota,g}$, resulting finally in

$$\langle\sigma\rangle_{\iota,g} \approx \int_{S(d)|_{\iota,g}} d\sigma \; \mathrm{prob}(\sigma)|_{\iota,g} \, \sigma \qquad (98)$$

with effective meta-probability on $S(d)|_{\iota,g}$

$$\mathrm{prob}(\sigma)|_{\iota,g} := \frac{\mathrm{prob}(\sigma)}{\int_{S(d)|_{\iota,g}} d\sigma' \; \mathrm{prob}(\sigma')} . \qquad (99)$$

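While the prior $\mathrm{prob}(\sigma)$ is generally not known, it is fair to assume that
it is isotropic on $S(d)$ in the sense that it is an arbitrary function of only the
distance, and hence relative entropy, with respect to some (usually highly
symmetric) reference state $\sigma_0$:

$$\mathrm{prob}(\sigma) \propto f[S(\sigma\|\sigma_0)] . \qquad (100)$$

This reference state coincides with the effective single-constituent state prior
to any measurement,

$$\sigma_0 = \langle\sigma\rangle_{\mathrm{prior}} := \int_{S(d)} d\sigma \; \mathrm{prob}(\sigma)\, \sigma , \qquad (101)$$

and hence represents any prior knowledge one may have about the system.
Applying the Pythagorean theorem (86) to the argument of $f$,

$$S(\sigma\|\sigma_0) = S(\sigma\|\mu^{\sigma_0}_{\iota,g}) + S(\mu^{\sigma_0}_{\iota,g}\|\sigma_0) , \qquad (102)$$

the posterior single-constituent state (98) becomes (modulo normalisation)

$$\langle\sigma\rangle_{\iota,g} \approx \int_{S(d)|_{\iota,g}} d\sigma \; f[S(\sigma\|\mu^{\sigma_0}_{\iota,g}) + S(\mu^{\sigma_0}_{\iota,g}\|\sigma_0)]\, \sigma . \qquad (103)$$

On the reduced manifold $S(d)|_{\iota,g}$ the function $f$ is still isotropic, this time
around $\mu^{\sigma_0}_{\iota,g}$; whence indeed

$$\langle\sigma\rangle_{\iota,g} \approx \mu^{\sigma_0}_{\iota,g} . \qquad (104)$$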
It is this result which justifies the effective description of single constituents
by means of MinREnt distributions [BDD+99]. In contrast to many textbook 
presentations we arrived at this result with purely statistical arguments and 
without any reference to chaoticity, ergodicity or other dynamical properties 
of the system. 
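
Both the factorisation (97) and the decomposition (102) rest on the Pythagorean theorem (86). A minimal numerical check in the classical case may help to see why the second term no longer depends on the constrained state. The sketch assumes a single observable with hypothetical values `X`, a hypothetical reference distribution `SIGMA` and a hypothetical state `RHO`; the MinREnt state is taken to be proportional to $\sigma_i \exp(-\lambda x_i)$, with $\lambda$ fixed by bisection so that it reproduces the average of `RHO`.

```python
import math

X = [0.0, 1.0, 2.0, 3.0]        # hypothetical observable values
SIGMA = [0.4, 0.3, 0.2, 0.1]    # hypothetical reference distribution
RHO = [0.1, 0.4, 0.2, 0.3]      # hypothetical state carrying the measured average

def relative_entropy(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean(p):
    return sum(x * pi for x, pi in zip(X, p))

def min_rel_ent_state(lam):
    """Distribution proportional to sigma_i * exp(-lam * x_i)."""
    weights = [s * math.exp(-lam * x) for s, x in zip(SIGMA, X)]
    z = sum(weights)
    return [w / z for w in weights]

def solve_lambda(target, lo=-20.0, hi=20.0, tol=1e-13):
    """Bisection; the mean falls monotonically as lam grows."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(min_rel_ent_state(mid)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu = min_rel_ent_state(solve_lambda(mean(RHO)))
lhs = relative_entropy(RHO, SIGMA)
rhs = relative_entropy(RHO, mu) + relative_entropy(mu, SIGMA)
print(lhs, rhs)   # the two numbers should agree up to the bisection tolerance
```

Any other distribution with the same average as `RHO` would yield the same second term $S(\mu\|\sigma)$, which is what allows it to be pulled out of the integral over the constrained manifold.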

One should take note, however, that in the asymptotics (98) the effective
meta-probability $\mathrm{prob}(\sigma)|_{\iota,g}$ on $S(d)|_{\iota,g}$ might still be broad, which in turn
may render the effective single-constituent state insufficient to account for all
macroscopic properties of interest; in particular it is generally not permitted
to infer from the measured averages the stronger result

$$\rho^{(N)} \approx (\mu^{\sigma_0}_{\iota,g})^{\otimes N}$$

for the full sequence [SBC01]. This leads to the question whether a given
level of description span$\{I, G_a\}$ actually suffices to characterise a system's
macrostate. It is here that physics comes into play: the appropriate choice
for the level of description depends on the physical problem at hand and
must take into account

• the desired accuracy; 

• the observables used for preparation and subsequent measurement, re- 
spectively; and 

• in case the task is to describe a system's macroscopic dynamics, the 
hierarchy of time scales and hence the (possibly extended) set of "slow" 
degrees of freedom to be accounted for in a Markovian transport equa- 
tion [RM96]. 

Of course, a level of description which is so large that it encompasses the 
totality of observables that may ever be measured macroscopically or impact 
the system's macrodynamics will by definition suffice; but such a choice will 
likely render the macroscopic description unduly complicated and have little 
predictive value. The question is therefore whether it is feasible to contract 
this maximal level of description to a smaller, manageable one. If so, this 
in itself represents a non-trivial statement about the system's macroscopic 
properties. 

The practical task amounts to checking whether the generalised canonical
states (14) associated with a larger level of description span$\{I, G_a, F_b\}$ and
with the contracted level of description span$\{I, G_a\}$, respectively, coincide
within some prescribed error margin. As both levels of description contain
the unit operator $I$, both states will lie on the same reduced state manifold
$S(d)|_\iota$. By Eq. (78) this manifold is endowed with a Riemannian metric,
so the distance between two states is given approximately by their relative
entropy. Using the Pythagorean theorem (86) with $\rho = \mu^\sigma_{\iota,g,f}$ this distance
can be expressed as

$$S(\mu^\sigma_{\iota,g,f}\|\mu^\sigma_{\iota,g}) = S(\mu^\sigma_{\iota,g,f}\|\sigma) - S(\mu^\sigma_{\iota,g}\|\sigma) , \qquad (105)$$

which in the special case $\sigma = I/d$ reduces further to the difference of ordinary
entropies $(S[\mu_{\iota,g}] - S[\mu_{\iota,g,f}])$. Thus the feasibility of a level contraction can
be assessed simply by comparing entropies associated with the original and
contracted levels of description, respectively. Whether or not an entropy
differential must be considered significant (and hence level contractions must
stop) depends both on the number of trials and on the accuracy of the
measurements performed. By Eq. (30) an entropy differential is not
significant as long as it is within the $O(1/N)$-range of statistical fluctuations.
Yet even when the entropy differential is outside this range it may still be
considered insignificant provided

$$S(\mu^\sigma_{\iota,g,f}\|\sigma) - S(\mu^\sigma_{\iota,g}\|\sigma) < S(\mu^\sigma_{\iota,g,f}\|\mu^\sigma_{\iota,g,f+\Delta f}) , \qquad (106)$$

where $\Delta f$ is the finite accuracy with which the (presumably redundant)
averages $\{f_b\}$ can be measured. The right-hand side is approximately quadratic
in $\Delta f$; an explicit formula will be given in Appendix A (Eq. (119)).

Successive contractions eventually lead to a smallest possible level of de- 
scription which cannot be contracted further without a significant increase in 
entropy. It is this smallest possible level of description which best captures 
the essential features of, and hence furnishes the most suitable theoretical 
model for, a system's macrostate.^ 
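
The entropy comparison behind a level contraction can be sketched for a classical die, in the spirit of the die analysis cited in the footnote. The sketch below assumes $\sigma = I/d$, takes the larger level of description to be spanned by the unit and the face value, and asks whether dropping the face value (contracting to the unit alone) raises the entropy by more than a crude $1/N$ threshold. The average of 3.6, the trial number and the use of $1/N$ without any prefactor are hypothetical choices, not data from the source.

```python
import math

FACES = [1, 2, 3, 4, 5, 6]
N_TRIALS = 20000        # hypothetical number of trials
MEASURED_MEAN = 3.6     # hypothetical measured average face value

def canonical(lam):
    """Generalised canonical distribution p_i = exp(-lam * i) / Z."""
    weights = [math.exp(-lam * i) for i in FACES]
    z = sum(weights)
    return [w / z for w in weights]

def mean(p):
    return sum(i * pi for i, pi in zip(FACES, p))

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def relative_entropy(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def solve_lambda(target, lo=-5.0, hi=5.0, tol=1e-13):
    """Bisection; the mean falls monotonically as lam grows."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(canonical(mid)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

uniform = canonical(0.0)                          # contracted level: unit only
mu_g = canonical(solve_lambda(MEASURED_MEAN))     # larger level: unit and face value

entropy_gap = entropy(uniform) - entropy(mu_g)
significant = entropy_gap > 1.0 / N_TRIALS
print(entropy_gap, relative_entropy(mu_g, uniform), significant)
```

With these hypothetical numbers the entropy differential exceeds the threshold, so the contraction would have to be rejected; with fewer trials or a measured average closer to 3.5 the same differential could be insignificant.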



A lucid illustration of such an iterative diagnosis can be found in Jaynes' analysis
of Wolf's die data [Jay79] as recounted in my earlier didactical note [Rau98]. There the
iteration proceeds in the opposite direction, starting from a minimal level of description
and successively enlarging it.
A Statistical mechanics

In this appendix I provide some link between the above conceptual discus- 
sions and the conventional formulation of statistical mechanics. 

Once the appropriate level of description is established, the macrostate of
a physical system can be characterised completely by the $m + 1$ expectation
values $\{\iota, g_a\}$ of the relevant observables. The associated MinREnt state (14)
has an (ordinary) entropy



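$$S[\mu^\sigma_{\iota,g}/\iota] + \left(\langle\ln\sigma\rangle_{\mu^\sigma_{\iota,g}/\iota} - \langle\ln\sigma\rangle_{I/d}\right) = \ln Z + \sum_{a=1}^{m} \lambda^a g_a/\iota , \qquad (107)$$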

which in the special case $\sigma = I/d$ and $\iota = 1$ reduces to the familiar relation

$$S[\mu_g] = \ln Z + \sum_{a=1}^{m} \lambda^a g_a . \qquad (108)$$

As

$$\frac{\partial}{\partial\lambda^a} \ln Z = -g_a/\iota , \qquad (109)$$

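infinitesimal variation yields

$$d\left\{S[\mu^\sigma_{\iota,g}/\iota] + \left(\langle\ln\sigma\rangle_{\mu^\sigma_{\iota,g}/\iota} - \langle\ln\sigma\rangle_{I/d}\right)\right\} = \sum_{a=1}^{m} \lambda^a\, d(g_a/\iota) \qquad (110)$$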



with the familiar special case

$$dS[\mu_g] = \sum_{a=1}^{m} \lambda^a\, dg_a . \qquad (111)$$

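Upon infinitesimal variation of Lagrange parameters or of the state's
normalisation, arbitrary expectation values change according to

$$d\langle A\rangle_{\mu^\sigma_{\iota,g}} = \left(\langle A\rangle_{\mu^\sigma_{\iota,g}}/\iota\right) d\iota - \sum_{a=1}^{m} \langle\delta G_a; A\rangle_{\mu^\sigma_{\iota,g}}\, d\lambda^a , \qquad (112)$$

where

$$\delta G_a := G_a - (g_a/\iota)\, I \qquad (113)$$

and $\langle\,\cdot\,;\cdot\,\rangle_\rho$ is the canonical correlation function with respect to the state $\rho$,

$$\langle B; A\rangle_\rho := \int_0^1 d\nu \; \mathrm{tr}\,[\rho^{\nu} B^\dagger \rho^{1-\nu} A] . \qquad (114)$$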

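Considering the special case $A = G_b$ and $d\iota = 0$ yields

$$dg_b = -\sum_{a=1}^{m} d\lambda^a\, C_{ab} \qquad (115)$$

with correlation matrix

$$C_{ab} := \langle\delta G_a; \delta G_b\rangle_{\mu^\sigma_{\iota,g}} = -\left(\frac{\partial g_b}{\partial\lambda^a}\right) . \qquad (116)$$

Since the canonical correlation function constitutes a scalar product, this
correlation matrix is symmetric and positive.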
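
Relation (116) lends itself to a quick numerical check. The sketch below assumes the commuting case with $\sigma = I/d$ and $\iota = 1$, in which the canonical correlation function reduces to an ordinary covariance, and compares that covariance with a central finite-difference estimate of $-\partial g_b/\partial\lambda^a$. The two observables, the Lagrange parameters and the step size are hypothetical.

```python
import math

# Hypothetical commuting observables on a four-level system (diagonal entries).
G = [[0.0, 1.0, 2.0, 3.0],
     [1.0, 0.0, 0.0, 1.0]]

def canonical(lam):
    """Normalised state mu_i = exp(-sum_a lam^a G_a[i]) / Z."""
    weights = [math.exp(-sum(l * g[i] for l, g in zip(lam, G)))
               for i in range(len(G[0]))]
    z = sum(weights)
    return [w / z for w in weights]

def averages(lam):
    mu = canonical(lam)
    return [sum(m * gi for m, gi in zip(mu, g)) for g in G]

def correlation_matrix(lam):
    """C_ab = <dG_a; dG_b>; a plain covariance when everything commutes."""
    mu, g = canonical(lam), averages(lam)
    size = len(G)
    return [[sum(m * (G[a][i] - g[a]) * (G[b][i] - g[b])
                 for i, m in enumerate(mu))
             for b in range(size)] for a in range(size)]

def minus_dg_dlam(lam, step=1e-5):
    """Central finite differences for -d g_b / d lam^a."""
    size = len(G)
    result = [[0.0] * size for _ in range(size)]
    for a in range(size):
        up, down = list(lam), list(lam)
        up[a] += step
        down[a] -= step
        g_up, g_down = averages(up), averages(down)
        for b in range(size):
            result[a][b] = -(g_up[b] - g_down[b]) / (2 * step)
    return result

lam = [0.3, -0.2]
C = correlation_matrix(lam)
D = minus_dg_dlam(lam)
print(C)
print(D)
```

In the genuinely non-commuting case the covariance would have to be replaced by the integral (114); the symmetry and positivity of the matrix carry over.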

The relative entropy of two different macrostates on the same level of
description reads

$$S(\mu_{\iota,g}\|\mu_{\iota',g'}) = \sum_{a=1}^{m} (\lambda'^a - \lambda^a)\, g_a + \iota\,(\ln Z' - \ln Z) - \iota \ln(\iota'/\iota) . \qquad (117)$$

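For neighboring states the right-hand side can be expanded in powers of
$\Delta g_a := g'_a - g_a$ and $\Delta\iota := \iota' - \iota$. Assuming identical normalisation ($\Delta\iota = 0$)
and using Eq. (109) as well as

$$\frac{\partial^2}{\partial\lambda^a\, \partial\lambda^b} \ln Z = C_{ab}/\iota , \qquad (118)$$

the first non-vanishing term is of second order:

$$S(\mu_{\iota,g}\|\mu_{\iota,g+\Delta g}) \approx \frac{1}{2} \sum_{a,b=1}^{m} C_{ab}\, \Delta\lambda^a \Delta\lambda^b \approx \frac{1}{2} \sum_{a,b=1}^{m} (C^{-1})^{ab}\, \Delta g_a \Delta g_b . \qquad (119)$$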

a,b=l a,b=l 

If calculated on the extended level of description span$\{I, G_a, F_b\}$, this result
provides the right-hand side of Eq. (106). To lowest order in $\Delta g$ the
correlation matrix $C$ may be evaluated in either of the two neighboring states.
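
The quality of the quadratic approximation (119) can be gauged with a one-parameter classical example. The sketch assumes a single observable with hypothetical eigenvalues `X`, $\sigma = I/d$ and $\iota = 1$, so that the correlation matrix reduces to the variance; it compares the exact relative entropy of two neighboring canonical states with the two quadratic forms, one in $\Delta\lambda$ and one in $\Delta g$.

```python
import math

X = [0.0, 1.0, 2.0, 3.0]   # hypothetical eigenvalues of a single observable

def canonical(lam):
    weights = [math.exp(-lam * x) for x in X]
    z = sum(weights)
    return [w / z for w in weights]

def mean(p):
    return sum(x * pi for x, pi in zip(X, p))

def variance(p):
    g = mean(p)
    return sum(pi * (x - g) ** 2 for x, pi in zip(X, p))

def relative_entropy(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

lam, d_lam = 0.4, 0.01      # hypothetical Lagrange parameter and small shift
mu = canonical(lam)
mu_shifted = canonical(lam + d_lam)

exact = relative_entropy(mu, mu_shifted)
c = variance(mu)                         # 1x1 correlation "matrix"
via_lambda = 0.5 * c * d_lam ** 2        # (1/2) C (d lambda)^2
d_g = mean(mu_shifted) - mean(mu)
via_g = 0.5 * d_g ** 2 / c               # (1/2) C^{-1} (d g)^2
print(exact, via_lambda, via_g)
```

The three numbers differ only at third order in the shift, which is why the correlation matrix may be evaluated in either neighboring state.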

How the above general results lead to familiar thermodynamic relations 
is discussed in a separate didactical note [Rau98]. 
B Emergence of classicality



In this appendix I discuss briefly the fact that, despite the quantum nature
of the underlying constituents, reasoning at the macroscopic level is
approximately classical, in the following sense. Measuring macroscopic averages in
the fashion described in Section 3 constitutes a learning process that can be
encapsulated in a macroscopic version of the quantum Bayes rule (8),

$$\mathrm{prob}(\Gamma^M_f|\Gamma^N_{\iota,g}) = \frac{\mathrm{prob}(\Gamma^N_{\iota,g}|\Gamma^M_f) \cdot \mathrm{prob}(\Gamma^M_f)}{\mathrm{prob}(\Gamma^N_{\iota,g})} ; \qquad (120)$$

it follows in a straightforward manner from the marginalisation (90) and
application of the quantum Bayes rule (8) to both factors in the integrand.
When the numbers $M$, $N$ of trials become large then there is the even stronger relation

$$\mathrm{prob}(\Gamma^M_f|\Gamma^N_{\iota,g}) \cdot \mathrm{prob}(\Gamma^N_{\iota,g}) \sim \mathrm{prob}(\Gamma^{M+N}_{\iota,g,f}) . \qquad (121)$$

This has the form of the classical product rule

$$\mathrm{prob}(a|b) \cdot \mathrm{prob}(b) = \mathrm{prob}(a \cap b) \qquad (122)$$

and thus indicates the emergence of classical probability calculus in the
macroscopic limit.

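The proof of this asymptotic product rule proceeds as follows. In the
marginalisation (90) one can apply the quantum Bayes rule (8) to the second
factor in the integrand to obtain

$$\mathrm{prob}(\Gamma^M_f|\Gamma^N_{\iota,g})\, \mathrm{prob}(\Gamma^N_{\iota,g}) = \int_{S(d)} d\sigma \; \mathrm{prob}(\Gamma^M_f|\sigma)\, \mathrm{prob}(\Gamma^N_{\iota,g}|\sigma)\, \mathrm{prob}(\sigma) . \qquad (123)$$

The first two factors on the right-hand side are given asymptotically by Eq.
(89), and the ensuing product of exponentials can be re-expressed by
quasi-linearity (75) to give (modulo normalisation)

$$\mathrm{prob}(\Gamma^M_f|\sigma)\, \mathrm{prob}(\Gamma^N_{\iota,g}|\sigma) \sim \int_{S(d)|_f} d\rho \int_{S(d)|_{\iota,g}} d\omega \; D_{M,N} \times \exp\left[-(M+N)\, S\!\left(\tfrac{M}{M+N}\rho + \tfrac{N}{M+N}\omega \,\Big\|\, \sigma\right)\right] \qquad (124)$$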
with

$$D_{M,N} := \exp\left[-(M+N) \left\{ S\!\left[\tfrac{M}{M+N}\rho + \tfrac{N}{M+N}\omega\right] - \tfrac{M}{M+N} S[\rho] - \tfrac{N}{M+N} S[\omega] \right\}\right] . \qquad (125)$$



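By strict concavity (76) this newly defined function is for large $M$, $N$ sharply
peaked around $\rho = \omega$. The latter equality is possible only if $\rho$ lies on the
reduced manifold $S(d)|_{\iota,g,f} = S(d)|_f \cap S(d)|_{\iota,g}$, which I assume to be
non-empty. Then one may replace in the asymptotic limit (modulo normalisation)

$$\int_{S(d)|_f} d\rho \int_{S(d)|_{\iota,g}} d\omega \; D_{M,N} \;\to\; \int_{S(d)|_{\iota,g,f}} d\rho \qquad (126)$$

and

$$S\!\left(\tfrac{M}{M+N}\rho + \tfrac{N}{M+N}\omega \,\Big\|\, \sigma\right) \;\to\; S(\rho\|\sigma) , \qquad (127)$$

whence

$$\mathrm{prob}(\Gamma^M_f|\sigma) \cdot \mathrm{prob}(\Gamma^N_{\iota,g}|\sigma) \sim \mathrm{prob}(\Gamma^{M+N}_{\iota,g,f}|\sigma) , \qquad (128)$$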



which in turn by Eqs. (96) and (123) implies the asymptotic product rule 
(121), Q.E.D. 
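
The two ingredients of this proof, the quasi-linear rewriting of the product of exponentials and the peaking of $D_{M,N}$ at $\rho = \omega$, can be checked numerically in the classical case. The sketch below works with logarithms to avoid underflow; the three distributions and the trial numbers are hypothetical, and the function names are introduced only for this illustration.

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def relative_entropy(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mix(p, q, weight):
    """Convex combination weight * p + (1 - weight) * q."""
    return [weight * a + (1 - weight) * b for a, b in zip(p, q)]

def log_D(rho, omega, m, n):
    """ln D_{M,N} = -(M+N) {S[mix] - M/(M+N) S[rho] - N/(M+N) S[omega]}."""
    w = m / (m + n)
    bracket = (entropy(mix(rho, omega, w))
               - w * entropy(rho) - (1 - w) * entropy(omega))
    return -(m + n) * bracket

# Hypothetical classical states and trial numbers.
rho = [0.5, 0.3, 0.2]
omega = [0.2, 0.3, 0.5]
sigma = [1 / 3, 1 / 3, 1 / 3]
M, N = 30, 70

# Log of the product of the two exponentials appearing via Eq. (89).
lhs = -M * relative_entropy(rho, sigma) - N * relative_entropy(omega, sigma)
# Log of the quasi-linear form: D_{M,N} times the exponential of the mixture.
rhs = (log_D(rho, omega, M, N)
       - (M + N) * relative_entropy(mix(rho, omega, M / (M + N)), sigma))

print(lhs, rhs)
print(log_D(rho, omega, M, N), log_D(rho, rho, M, N))
```

Because $\ln D_{M,N}$ is non-positive, vanishes at $\rho = \omega$ and scales linearly with $M + N$, contributions from $\rho \neq \omega$ are exponentially suppressed as the trial numbers grow.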